The Condorcet project: indexing scientific documents
Abstract
This paper presents Condorcet, a domain-specific prototype indexing system for tens of thousands of documents covering two scientific domains: engineering ceramics and epilepsy. The development corpus consists of 800 documents taken from one-year volumes of two scientific journals. Condorcet takes a controlled-term approach to indexing. The indexing process makes intensive use of linguistic knowledge. The paper discusses how principle-based natural language processing strategies and structured knowledge sources are used in a semi-automatic, controlled-term...
Related resources
Combining linguistic and knowledge-based engineering for information retrieval and information extraction
Controlled-term indexing (the method of choice for multimedia collections and still very popular for purely textual material) appears to be an expensive solution because it takes huge resources and manual indexing. It is not possible, however, to perform a well-founded assessment of various approaches to information retrieval. We discuss ways to improve controlled-term indexing and illustrate these b...
Large-Scale Semantic Indexing of Biomedical Publications
Automated annotation of scientific publications in real-world digital libraries requires dealing with challenges such as a large number of concepts and training examples, multi-label training examples, and a hierarchical structure of concepts. BioASQ is a European project that contributes a large-scale biomedical publications corpus for working on these challenges. This paper documents the participa...
QCT and SF services in Torii: Human Evaluations of Documents Benefit to the Community
This paper describes two services of the Torii portal dedicated to the High Energy Physics research community and developed within the context of the TIPS European project. Both services relate to the reuse of evaluations performed by humans on scientific publications. The first one, called QCT (Quality Control Tools), aims at collecting detailed human evaluations of documents in order to...
A Survey of Indexing and Retrieval of Multimodal Documents: Text and Images
A document conveys information using multiple modalities, including text, layout/style and images. For example, journal articles usually have figures to illustrate experimental results, and the title in a journal article usually has a different font size than the body text. Indexing and retrieval using only text is the traditional way of IR (Information Retrieval). With the development of the I...
Concept Mining for Indexing Medical Literature
This article addresses the task of mining concepts from biomedical literature to index and search through this document base. This research takes place within the Telemakus project, whose goal is to support and facilitate the knowledge discovery process by providing retrieval, visual, and interaction tools to mine and map research findings from research literature in the field of aging. A...
Publication date: 2007